In this notebook we analyze the Microsoft stock price.
We look for the most accurate model by training on different amounts of historical data.
Let's import our libraries to start the analysis:
import pandas as pd
from fbprophet import Prophet
from fbprophet.plot import plot_plotly, plot_components_plotly
import plotly.express as px
import plotly
plotly.offline.init_notebook_mode()
import os, sys
path = os.getcwd()
path = os.path.dirname(path)
sys.path.append(path)
from train import train, save_model
import datetime as dt
from datetime import timedelta
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np
Microsoft's stock price over the last years:
# Loading Microsoft data
microsoft = pd.read_csv('../data/microsoft.csv')
microsoft
fig = px.line(microsoft, x='ds', y='y', labels={'ds': 'Date', 'y': 'Price'},
              title='Microsoft Stock Price 2002-2021')
fig.update_layout(height=400, width=900, autosize=False, showlegend=True)
fig.show()
This pattern is familiar, because it closely resembles the Amazon and Apple stock prices: the price stays flat until about 2015 and then rises sharply.
We want to train the model on all the data, on the last 10 years, and on the last 5 years to compare the results:
# We want to predict the year 2021
microsoft['ds'] = pd.to_datetime(microsoft['ds'])
X_test = microsoft[microsoft['ds'].dt.year == 2021][['ds']]
X_test
# Full data 2001-2020
X_train_full_data = microsoft[microsoft['ds'].dt.year != 2021]
X_train_full_data
# Predictions
model = Prophet()
model.fit(X_train_full_data)
forecast = model.predict(X_test)
fig = plot_plotly(model, forecast, xlabel='Date', ylabel='Price')
fig.update_layout(title='MICROSOFT stock 2021 Predictions - Model trained with Full Data')
fig.show()
# Validating predictions
val = forecast.merge(microsoft, on='ds', how='right')
val = val[['ds', 'yhat', 'y']]
val.columns = ['Date', 'Predicted Price', 'True Price']
val = val[val.Date.dt.year == 2021]
fig = px.scatter(val, x=val.Date, y=val.columns[1:],
title='MICROSOFT stock 2021 Predictions - Validation')
fig.update_traces(marker_size=5)
fig.update_layout(height=400, width=900, autosize=False, showlegend=True)
fig.show()
# Forecast Components
plot_components_plotly(model, forecast)
# Scores
def scores(y_true, y_pred):
    print('MAE:', mean_absolute_error(y_true, y_pred))
    print('RMSE:', np.sqrt(mean_squared_error(y_true, y_pred)))
y_true = microsoft[microsoft.ds.dt.year == 2021]['y']
y_pred = forecast['yhat']
scores(y_true, y_pred)
print('Mean Microsoft Price in 2021: $', round(val['True Price'].mean(), 2))
# Score = 1 - MAE / mean price (49.9 is the MAE printed above)
print('Score:', 1 - 49.9 / val['True Price'].mean())
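The score printed above is one minus the MAE divided by the mean true price, so a value near 1 means the average error is small relative to the price level. A minimal sketch of that calculation as a reusable helper follows; the name `relative_score` and the sample numbers are illustrative assumptions and are not part of the notebook's `train` module:

```python
import numpy as np

def relative_score(y_true, y_pred):
    """Return 1 - MAE / mean(y_true). Hypothetical helper, not from the notebook."""
    # Convert to plain arrays so pandas index labels cannot misalign the values
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae = np.mean(np.abs(y_true - y_pred))
    return 1 - mae / y_true.mean()

# Made-up prices, only to show the call
example_score = relative_score([100.0, 200.0, 300.0], [110.0, 190.0, 320.0])
print(round(example_score, 4))
```

In the notebook this could be called as `relative_score(y_true, forecast['yhat'])`, which avoids copying the MAE value by hand.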
Our first score was 0.81, which is good. The price in 2021 keeps following the trend, so the model tracks it reasonably well. Let's try with fewer years of data:
# Training data - last 10 years (2010-2020, inclusive)
X_train_last_ten = microsoft[(microsoft.ds.dt.year >= 2010) & (microsoft.ds.dt.year <= 2020)]
X_train_last_ten
# Predictions
model = Prophet()
model.fit(X_train_last_ten)
forecast2 = model.predict(X_test)
fig = plot_plotly(model, forecast2, xlabel='Date', ylabel='Price')
fig.update_layout(title='MICROSOFT stock 2021 Predictions - Model trained with last 10 years of Data')
fig.show()
# Validating predictions
val2 = forecast2.merge(microsoft, on='ds', how='right')
val2 = val2[['ds', 'yhat', 'y']]
val2 = val2[val2.ds.dt.year == 2021]
val2.columns = ['Date', 'Predicted Price', 'True Price']
fig = px.scatter(val2, x=val2.Date, y=val2.columns[1:], labels={'ds': 'Date'},
title='MICROSOFT stock 2021 Predictions - Validation')
fig.update_traces(marker_size=5)
fig.update_layout(height=400, width=900, autosize=False, showlegend=True)
fig.show()
plot_components_plotly(model, forecast2)
y_true = microsoft[microsoft.ds.dt.year == 2021]['y']
y_pred = forecast2['yhat']
scores(y_true, y_pred)
print('Mean Microsoft Price in 2021: $', round(val2['True Price'].mean(), 2))
# Score = 1 - MAE / mean price (26.5 is the MAE printed above)
print('Score:', 1 - 26.5 / val2['True Price'].mean())
With about half of the data we get a better score of 0.90. The model stays close to the price, again because the price keeps following the upward trend.
Maybe training with even less data can improve the results further. Let's check:
# Training data - last 5 years (2015-2020, inclusive)
X_train_last_five = microsoft[(microsoft.ds.dt.year >= 2015) & (microsoft.ds.dt.year <= 2020)]
X_train_last_five
# Predictions
model = Prophet()
model.fit(X_train_last_five)
forecast3 = model.predict(X_test)
fig = plot_plotly(model, forecast3, xlabel='Date', ylabel='Price')
fig.update_layout(title='MICROSOFT stock 2021 Predictions - Model trained with last 5 years of Data')
fig.show()
# Validating predictions
val3 = forecast3.merge(microsoft, on='ds', how='right')
val3 = val3[['ds', 'yhat', 'y']]
val3 = val3[val3.ds.dt.year == 2021]
val3.columns = ['Date', 'Predicted Price', 'True Price']
fig = px.scatter(val3, x=val3.Date, y=val3.columns[1:],
title='MICROSOFT stock 2021 Predictions - Validation')
fig.update_traces(marker_size=5)
fig.update_layout(height=400, width=900, autosize=False, showlegend=True)
fig.show()
plot_components_plotly(model, forecast3)
# Scores
y_true = microsoft[microsoft.ds.dt.year == 2021]['y']
y_pred = forecast3['yhat']
scores(y_true, y_pred)
print('Mean Microsoft Price in 2021: $', round(val3['True Price'].mean(), 2))
# Score = 1 - MAE / mean price (14.61 is the MAE printed above)
print('Score:', 1 - 14.61 / val3['True Price'].mean())
With 5 years of data the predictions track the price closely, especially in the first half of the year. Notably, the predictions dip in April together with the price.
The score here is 0.94, with an MAE of 14.61 and an RMSE below 20. Very good results.
The best performance came from 5 years of training data.
results = pd.DataFrame(
{'MAE': [49.9, 26.5, 14.61],
'RMSE': [55.18, 32.64, 19.72],
'Train Data': ['All the data', 'Last 10 years', 'Last 5 years']})
fig = px.bar(results, x='Train Data', y=['MAE', 'RMSE'], barmode='group',
             title='2021 MAE and RMSE: All Data vs Last 10 and 5 Years (Lower is Better)')
fig.update_layout(height=400, width=900, autosize=False, showlegend=True)
fig.show()
# val2 holds the 10-year model's predictions, val3 the 5-year model's
val['Last 10 Years'] = val2['Predicted Price']
val['Last 5 Years'] = val3['Predicted Price']
val = val.rename(columns={'Predicted Price': 'All the Data'})
fig = px.line(val, x='Date', y=val.columns[1:], title='Microsoft Stock Predictions: Train with all the Data vs Train with Last 10 and 5 years')
fig.update_layout(height=400, width=900, autosize=False, showlegend=True)
fig.show()
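The three training sets in this comparison differ only in their first year. Below is a sketch of a helper that selects such a window from one frame; `select_years` and the toy data are assumptions for illustration, not code from the notebook. Both bounds are inclusive, so a range such as 2010 to 2020 covers 11 calendar years:

```python
import pandas as pd

def select_years(df, first_year, last_year, date_col='ds'):
    """Return rows whose date falls in [first_year, last_year], both inclusive."""
    years = pd.to_datetime(df[date_col]).dt.year
    return df[(years >= first_year) & (years <= last_year)]

# Toy data: one row per year from 2001 to 2021 (not the real Microsoft prices)
toy = pd.DataFrame({
    'ds': pd.to_datetime([f'{year}-06-30' for year in range(2001, 2022)]),
    'y': range(21),
})

windows = {
    'All the data': select_years(toy, 2001, 2020),
    'Last 10 years': select_years(toy, 2010, 2020),
    'Last 5 years': select_years(toy, 2015, 2020),
}
for name, frame in windows.items():
    print(name, len(frame))
```

Keeping the windows in one dictionary also makes it easy to loop over them when fitting one model per window.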
Let's train our final model to make future predictions for Microsoft stock.
model = train('microsoft', '../data/microsoft.csv', False, True, len(X_train_last_five))
# Making future predictions with the model: two years
# 1. Creating the forecast Dates
X_test_future = []
end = dt.datetime.strptime('2023-12-31', '%Y-%m-%d').date()
start = dt.datetime.strptime('2021-11-20', '%Y-%m-%d').date()
for i in range((end-start).days):
    X_test_future += [(start+timedelta(i)).strftime('%Y-%m-%d')]
X_test_future = pd.DataFrame(X_test_future)
X_test_future.columns = ['ds']
X_test_future
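The date loop above can also be written with `pd.date_range`. The sketch below builds an equivalent frame, assuming daily frequency and the same start and end dates; the name `future_dates` is illustrative. Like the loop, it stops one day before the end date:

```python
import pandas as pd

start = '2021-11-20'
end = '2023-12-31'

# date_range includes both endpoints, so drop the last entry to match
# range((end - start).days), which excludes the end date
dates = pd.date_range(start, end, freq='D')[:-1]

# Format as 'YYYY-MM-DD' strings to match the loop's output
future_dates = pd.DataFrame({'ds': dates.strftime('%Y-%m-%d')})
print(len(future_dates), future_dates['ds'].iloc[0], future_dates['ds'].iloc[-1])
```

The string formatting is kept only for parity with the loop; a datetime column should work equally well as the `ds` input.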
# 2. Making predictions: 2 years
forecast = model.predict(X_test_future)
fig = plot_plotly(model, forecast, xlabel='Date', ylabel='Price')
fig.update_layout(title='Microsoft Stocks - Two Years Forecasting')
fig.show()
Finally, we save our model:
# Saving the model
save_model('../models', model, 'microsoft')
In this case we were able to predict Microsoft's 2021 price well. The price did not break its trend, which is why the predictions reached a score of 0.94.
This suggests that we can make useful predictions for a stock only as long as its price keeps following the trend of recent years.
In the next notebook we will analyze the last stock, Tesla, whose price has risen very fast in recent months.